Learning Chinese Word Embeddings With Words and Subcharacter N-Grams
Authors
Abstract
Similar Resources
Exploring Word Embeddings and Character N-Grams for Author Clustering
We presented our system for the PAN 2016 Author Clustering task. Our software used simple character n-grams to represent the document collection. We then ran K-Means clustering optimized using the Silhouette Coefficient. Our system yielded competitive results and required only a short runtime. Character n-grams can capture a wide range of information, making them effective for authorship attribution...
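A minimal sketch of the pipeline this abstract describes, assuming scikit-learn: character n-gram TF-IDF features, K-Means clustering, and the Silhouette Coefficient used to pick the number of clusters. The toy documents and the 2–4 n-gram range are illustrative placeholders, not details taken from the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy document collection (placeholder data).
documents = [
    "the quick brown fox jumps over the lazy dog",
    "a fast auburn fox leaps over a sleepy hound",
    "stock prices rallied sharply after the earnings report",
    "markets climbed as quarterly earnings beat expectations",
]

# Represent each document by character n-grams (here 2- to 4-grams).
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vectorizer.fit_transform(documents)

# Try several cluster counts and keep the one with the best silhouette score.
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, len(documents)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(best_k, best_score, best_labels)
```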
Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components
Word embeddings have attracted much attention recently. Different from alphabetic writing systems, Chinese characters are often composed of subcharacter components which are also semantically informative. In this work, we propose an approach to jointly embed Chinese words as well as their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the conte...
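A rough PyTorch sketch of the general idea of such a joint embedding: the target word is predicted from three context views (words, characters, and subcharacter components), giving three likelihoods whose losses are summed. The vocabulary sizes, the averaging of each view, and the shared softmax output layer are placeholder assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    def __init__(self, n_words, n_chars, n_comps, dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.char_emb = nn.Embedding(n_chars, dim)
        self.comp_emb = nn.Embedding(n_comps, dim)
        self.out = nn.Linear(dim, n_words)  # softmax over target words

    def forward(self, ctx_words, ctx_chars, ctx_comps):
        # Each context view is averaged into a single vector.
        w = self.word_emb(ctx_words).mean(dim=1)
        c = self.char_emb(ctx_chars).mean(dim=1)
        s = self.comp_emb(ctx_comps).mean(dim=1)
        return self.out(w), self.out(c), self.out(s)

model = JointEmbedding(n_words=5000, n_chars=3000, n_comps=500)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: 2 training positions, each with 4 context words/characters/components.
ctx_words = torch.randint(0, 5000, (2, 4))
ctx_chars = torch.randint(0, 3000, (2, 4))
ctx_comps = torch.randint(0, 500, (2, 4))
target = torch.randint(0, 5000, (2,))

# Three likelihoods, one per context view, summed into a joint objective.
logits_w, logits_c, logits_s = model(ctx_words, ctx_chars, ctx_comps)
loss = loss_fn(logits_w, target) + loss_fn(logits_c, target) + loss_fn(logits_s, target)
loss.backward()
print(float(loss))
```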
Beyond Word N-Grams
We describe, analyze, and experimentally evaluate a new probabilistic model for word-sequence prediction in natural languages, based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These ...
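A much-simplified sketch of variable-order next-word prediction in the spirit of a prediction suffix tree: next-word counts are stored for every context suffix up to a maximum depth, and prediction backs off to the longest suffix seen in training. The Bayesian tree mixtures and unbounded-vocabulary machinery described above are not reproduced; the toy corpus is illustrative.

```python
from collections import defaultdict, Counter

MAX_DEPTH = 3
counts = defaultdict(Counter)  # context suffix (tuple of words) -> next-word counts

def train(sentence):
    words = sentence.split()
    for i, nxt in enumerate(words):
        # Record the next word under every suffix of its context, up to MAX_DEPTH.
        for d in range(0, min(MAX_DEPTH, i) + 1):
            suffix = tuple(words[i - d:i])
            counts[suffix][nxt] += 1

def predict(context):
    words = tuple(context.split())
    # Back off from the longest available suffix to the empty context.
    for d in range(min(MAX_DEPTH, len(words)), -1, -1):
        suffix = words[len(words) - d:] if d > 0 else ()
        if suffix in counts:
            return counts[suffix].most_common(1)[0][0]
    return None

for line in ["the cat sat on the mat", "the dog sat on the rug"]:
    train(line)

print(predict("sat on the"))  # continuation under the longest known suffix
```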
Improved Learning of Chinese Word Embeddings with Semantic Knowledge
While previous studies show that modeling the minimum meaning-bearing units (characters or morphemes) benefits learning vector representations of words, they ignore the semantic dependencies across these units when deriving word vectors. In this work, we propose to improve the learning of Chinese word embeddings by exploiting semantic knowledge. The basic idea is to take the semantic knowledge a...
Improving Word Alignment of Rare Words with Word Embeddings
We address the problem of inducing word alignment for language pairs by developing an unsupervised model that can also be applied to other generative alignment models. We approach the task by: i) proposing a new alignment model, based on IBM alignment model 1, that uses vector representations of words, and ii) examining the use of similar source words to overcome the problem of r...
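For context, a compact sketch of classic IBM alignment model 1 estimated with EM, the base model the abstract above builds on. The toy bilingual corpus and the fixed iteration count are illustrative; the embedding-based handling of rare words described in the paper is not reproduced here.

```python
from collections import defaultdict

# Toy parallel corpus: (source sentence, target sentence) pairs.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

src_vocab = {w for src, _ in corpus for w in src}
tgt_vocab = {w for _, tgt in corpus for w in tgt}

# Uniform initialization of translation probabilities t(target | source).
t = {s: {g: 1.0 / len(tgt_vocab) for g in tgt_vocab} for s in src_vocab}

for _ in range(10):  # EM iterations
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for src, tgt in corpus:
        for g in tgt:
            norm = sum(t[s][g] for s in src)  # E-step: soft alignment weights
            for s in src:
                frac = t[s][g] / norm
                counts[s][g] += frac
                totals[s] += frac
    for s in src_vocab:  # M-step: re-estimate t(g | s)
        for g in tgt_vocab:
            t[s][g] = counts[s][g] / totals[s]

print(max(t["haus"], key=t["haus"].get))  # expected: "house"
```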
Journal
Journal title: IEEE Access
Year: 2019
ISSN: 2169-3536
DOI: 10.1109/access.2019.2908014